    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, which support a wide class of applications but deliver moderate computing performance, to many-core GPUs, which exploit aggressive data parallelism and deliver higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art, production-level LQCD Monte Carlo application using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance portability can be reached. (Comment: 26 pages, 2 png figures; preprint of an article submitted for consideration in International Journal of Modern Physics.)
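    To illustrate the descriptive style referred to above, the minimal C sketch below (not taken from the paper's code base; the function, array names and data clauses are hypothetical) shows how a site-local update loop can be offloaded simply by annotating it with OpenACC directives, leaving the mapping onto CPU or GPU hardware to the compiler.

        /* Hypothetical site-local update: y[i] += a * x[i] over all lattice sites.
           The same source can be compiled for multi-core CPUs or GPUs; the
           compiler decides how to map gangs and vectors onto the chosen target. */
        void axpy_sites(int n_sites, double a,
                        const double *restrict x, double *restrict y)
        {
            #pragma acc parallel loop copyin(x[0:n_sites]) copy(y[0:n_sites])
            for (int i = 0; i < n_sites; i++) {
                y[i] += a * x[i];
            }
        }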

    ARIANNA: A research environment for neuroimaging studies in autism spectrum disorders

    The complexity and heterogeneity of Autism Spectrum Disorders (ASD) require the implementation of dedicated analysis techniques to extract the maximum information from the interrelationships among the many variables that describe affected individuals, spanning from clinical phenotypic characterization and genetic profile to structural and functional brain images. The ARIANNA project has developed a collaborative interdisciplinary research environment that is easily accessible to the community of researchers working on ASD (https://arianna.pi.infn.it). The main goals of the project are: to analyze neuroimaging data acquired in multiple sites with multivariate approaches based on machine learning; to detect structural and functional brain characteristics that allow individuals with ASD to be distinguished from control subjects; to identify neuroimaging-based criteria to stratify the population with ASD to support the future development of personalized treatments. Secure data handling and storage are guaranteed within the project, as well as access to fast grid/cloud-based computational resources. This paper outlines the web-based architecture, the computing infrastructure and the collaborative analysis workflows underlying the ARIANNA interdisciplinary working environment. It also demonstrates the full functionality of the research platform. The availability of this innovative working environment for analyzing clinical and neuroimaging information of individuals with ASD is expected to support researchers in disentangling complex data, thus facilitating their interpretation.

    Analisi del decadimento W -> tau nu in CMS a LHC

    This thesis work was carried out within the CMS experiment at the LHC and focuses on the study of strategies for the off-line identification of the tau lepton, which is expected among the decay products of the Higgs boson as well as of other particles predicted by other theoretical models. The channel used to test the tau identification procedure is the semileptonic decay of the W vector boson. Finally, based on the results obtained, a quantitative estimate is presented of the production cross section pp-> W +
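    For context, a cross-section estimate of this kind conventionally follows the standard counting relation below; the symbols (observed and background event counts, tau-identification efficiency, integrated luminosity) are generic textbook quantities, not notation or values taken from the thesis.

        \sigma \times \mathrm{BR}(W \to \tau\nu_\tau) \;=\;
            \frac{N_{\mathrm{obs}} - N_{\mathrm{bkg}}}{\varepsilon_{\tau}\,\mathcal{L}_{\mathrm{int}}}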

    Studies on nVidia GPUs in parallel computing for Lattice QCD and Computational Fluid Dynamics applications

    Over the last 20 years, the computing revolution has created many social benefits. The energy and environmental footprint of computing has grown, and as a consequence energy efficiency is becoming increasingly important. The evolution toward always-on connectivity is adding demands for efficient computing performance. The result is a strong market pull for technologies that improve processor performance while reducing energy use. The improvements in energy performance have largely come as a side effect of Moore's law: the number of transistors on a chip doubles about every two years, thanks to ever-smaller circuitry. Better performance and better energy efficiency follow from placing more transistors on a single chip, with less physical distance between them. In the last few years, however, the energy-related benefits resulting from Moore's law have been slowing down, threatening future advances in computing. This is caused by physical limits being reached in the miniaturization of transistors. The industry's answer to this problem, for now, is new processor architectures and more power-efficient technologies. For decades, the Central Processing Unit (CPU) of a computer has been the component designated to run general programming tasks, excelling at running computing instructions serially and using a variety of complex techniques and algorithms in order to improve speed. Graphics Processing Units (GPUs) are specialized accelerators originally designed to paint millions of pixels simultaneously across a screen, doing so by performing parallel calculations with a simpler architecture. In recent years the development of the video game market compelled GPU manufacturers to increase the floating-point calculation performance of their products, by far exceeding the performance of standard CPUs in floating-point calculations. The architecture evolved toward programmable many-core chips designed to process massive amounts of data in parallel. These developments suggested the possibility of using GPUs in the field of High-Performance Computing (HPC) as low-cost substitutes for more traditional CPU-based architectures: nowadays this possibility is being fully exploited, and GPUs represent an ongoing breakthrough for many computationally demanding scientific fields, providing consistent computing resources at relatively low cost, also in terms of power consumption (watts/flops). Due to their many-core architectures, with fast access to the on-board memory, GPUs are ideally suited for numerical tasks allowing for data parallelism, i.e., for Single Instruction Multiple Data (SIMD) parallelization. In this thesis, parallel computing in Lattice Quantum Chromodynamics (Lattice QCD or LQCD) and in Computational Fluid Dynamics (CFD) using multi-GPU systems is presented, highlighting the software approach in each case and trying to understand how to build and exploit the next generation of clusters for scientific computing. In Chapter 1, the fundamentals of parallel computing are presented and a description of the hardware architecture of the main multiple-processor systems is provided. The use of GPUs for general-purpose parallel computing is presented in Chapter 2, comparing them to traditional CPUs. Moreover, a brief history of GPU devices is presented, highlighting the evolution from devices exclusively dedicated to graphics applications to the DirectX 8 generation, the first that could be fully dedicated to general-purpose applications. In Chapter 3, the Compute Unified Device Architecture (CUDA) is described. This is a parallel computing platform and application programming interface (API) model created by nVidia that allows software developers to use nVidia GPUs (the so-called CUDA-enabled ones) for general-purpose processing. Chapter 4 describes the GPU approach to the Lattice QCD field. Two different approaches to exploiting GPU computing are presented, through CUDA and OpenACC, where an existing code has been adapted to fully take advantage of accelerator devices such as GPUs. A performance comparison is shown, and the first studies for the implementation of a complete Lattice QCD simulation on a multi-GPU system are presented. A different application field is analyzed in Chapter 5, where a multi-GPU system is tested and optimized for CFD purposes, using proprietary software such as ANSYS Fluent.
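    As a rough illustration of the multi-GPU direction mentioned above (a sketch under generic assumptions, not code from the thesis), the OpenACC runtime exposes calls for assigning different devices to different processes or lattice sub-domains:

        #include <openacc.h>

        /* Round-robin GPU assignment, e.g. one device per process or per
           lattice sub-domain. The rank argument is purely illustrative; in a
           real code it would come from the communication layer (e.g. an MPI rank). */
        void select_device_for_rank(int rank)
        {
            int n_gpus = acc_get_num_devices(acc_device_nvidia);
            if (n_gpus > 0)
                acc_set_device_num(rank % n_gpus, acc_device_nvidia);
        }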

    Preserving access to ALEPH computing environment via virtual machines

    The ALEPH Collaboration [1] took data at the LEP (CERN) electron-positron collider in the period 1989-2000, producing more than 300 scientific papers. While most of the Collaboration's activities stopped in recent years, the data collected still have physics potential, with new theoretical models emerging that call for checks against data at the Z and WW production energies. An attempt to revive and preserve the ALEPH Computing Environment is presented; the aim is not only the preservation of the data files (usually called bit preservation), but of the full environment a physicist would need to perform brand new analyses. Technically, a Virtual Machine approach has been chosen, using the VirtualBox platform. Concerning simulated events, the full chain from event generators to physics plots is possible, and reprocessing of data events is also functioning. Interactive tools like the DALI event display can be used on both data and simulated events. The Virtual Machine approach is suited both for interactive usage and for massive computing using Cloud-like approaches.

    Designing and Optimizing LQCD codes using OpenACC

    An increasing number of massively parallel machines adopt heterogeneous node architectures combining traditional multicore CPUs with energy-efficient and fast accelerators. Programming heterogeneous systems can be cumbersome, and designing efficient codes often becomes a hard task. The lack of standard programming frameworks for accelerator-based machines makes it even more complex; in fact, in most cases satisfactory performance implies rewriting the code, usually written in C or C++, using proprietary programming languages such as CUDA. OpenACC offers a different approach based on directives. Porting applications to run on hybrid architectures “only” requires annotating existing codes with specific “pragma” instructions, which identify functions to be executed on accelerators and instruct the compiler on how to structure and generate code for the specific target device. In this talk we present our experience in designing and optimizing an LQCD code targeted at multi-GPU cluster machines, giving details of its implementation and presenting preliminary results.
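    To make the “pragma” annotation style concrete, a minimal hedged sketch (the function, array name and update are invented, not taken from the code described in the talk) might keep a lattice field resident on the accelerator across repeated update sweeps by wrapping them in an OpenACC data region:

        /* Keep the field resident on the device across many update sweeps,
           so that only the enclosing data region pays the host-device
           transfer cost. */
        void run_sweeps(int n_sweeps, int n_sites, double *restrict field)
        {
            #pragma acc data copy(field[0:n_sites])
            {
                for (int sweep = 0; sweep < n_sweeps; sweep++) {
                    #pragma acc parallel loop present(field[0:n_sites])
                    for (int i = 0; i < n_sites; i++) {
                        field[i] *= 0.5;   /* placeholder site update */
                    }
                }
            }
        }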

    Development of scientific software for HPC architectures using OpenACC: The case of LQCD

    Many scientific software applications that solve complex compute- or data-intensive problems, such as large parallel simulations of physics phenomena, increasingly use HPC systems in order to achieve scientifically relevant results. An increasing number of HPC systems adopt heterogeneous node architectures, combining traditional multi-core CPUs with energy-efficient, massively parallel accelerators such as GPUs. The need to exploit the computing power of these systems, in conjunction with the lack of standardization in their hardware and/or programming frameworks, raises new issues with respect to scientific software development choices, which strongly impact software maintainability, portability and performance. Several new programming environments have been introduced recently in order to address these issues. In particular, the OpenACC programming standard has been designed to ease the software development process for codes targeted at heterogeneous machines, helping to achieve code and performance portability. In this paper we present, as a specific OpenACC use case, an example of the design, porting and optimization of an LQCD Monte Carlo code intended to be portable across present and future heterogeneous HPC architectures; we describe the design process and the most critical design choices, and evaluate the trade-off between portability and efficiency.
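    One concrete face of the portability/efficiency trade-off discussed here is how much architecture-specific tuning is written into the directives themselves. The hedged sketch below (the loop body and clause values are illustrative only, not taken from the paper) contrasts a fully descriptive loop with one carrying explicit tuning clauses that may favour one particular GPU:

        void update_portable(int n, double *restrict v)
        {
            /* Descriptive form: the compiler picks the parallel decomposition. */
            #pragma acc parallel loop
            for (int i = 0; i < n; i++)
                v[i] = 2.0 * v[i] + 1.0;
        }

        void update_tuned(int n, double *restrict v)
        {
            /* Prescriptive form: explicit gang/vector sizing chosen for one
               target may need re-tuning (or hurt) on a different architecture. */
            #pragma acc parallel loop gang vector vector_length(128)
            for (int i = 0; i < n; i++)
                v[i] = 2.0 * v[i] + 1.0;
        }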

    Portable LQCD Monte Carlo code using OpenACC

    Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing, where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art, production-level LQCD Monte Carlo application, using the OpenACC directive model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code onto the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.